AITopics | difference reward

difference reward

Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

Nigam, Rupal, Parikh, Niket, Osooli, Hamid, Yuasa, Mikihisa, Heglund, Jacob, Tran, Huy T.

arXiv.org Artificial Intelligence | Oct-21-2025

Abstract: Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.

Ad hoc teaming (AHT) is an open challenge for multi-agent systems, in which an autonomous agent must successfully coordinate with other unknown agents [1]. Consider a search-and-rescue mission where robots are deployed from different organizations and expected to cooperate with each other on the fly; these robots may have different biases in how they achieve a given objective (e.g., risky vs. risk-averse search) or have different capabilities (e.g., sensing vs. manipulation). Adapting to such differences would enable agents to effectively and autonomously complete tasks where the team is unknown prior to deployment.
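As a rough illustration of one of the two ideas named in the abstract: generalized policy improvement (GPI), in its standard form, acts greedily with respect to the maximum over several pretrained action-value functions. The sketch below is not the GPAT algorithm, since the abstract does not say how GPI is combined with difference rewards. The function `gpi_action` and the toy Q-functions `q_team_a` and `q_team_b` are hypothetical names introduced only for this example.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical type: an action-value function Q(state, action) -> float.
QFunction = Callable[[Hashable, int], float]


def gpi_action(q_functions: Sequence[QFunction], state: Hashable, n_actions: int) -> int:
    """Generic GPI rule: return argmax_a max_i Q_i(state, a)."""
    best_action = 0
    best_value = float("-inf")
    for action in range(n_actions):
        # Best value any pretrained policy promises for this action.
        value = max(q(state, action) for q in q_functions)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Toy Q-functions standing in for policies pretrained with two different teams
# (assumed values, not taken from the paper).
def q_team_a(state: Hashable, action: int) -> float:
    return [1.0, 0.0, 0.5][action]


def q_team_b(state: Hashable, action: int) -> float:
    return [0.2, 2.0, 0.1][action]


chosen = gpi_action([q_team_a, q_team_b], state="s0", n_actions=3)
```

With these toy values, the rule picks the action that any one of the library policies values most highly, rather than committing to a single pretrained policy in advance.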

artificial intelligence, learner, teammate, (17 more...)

2510.16187

Country: North America > United States > Illinois > Champaign County (0.15)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Cooperative Search and Track of Rogue Drones using Multiagent Reinforcement Learning

Valianti, Panayiota, Malialis, Kleanthis, Kolios, Panayiotis, Ellinas, Georgios

arXiv.org Artificial Intelligence | Jan-7-2025

This work considers the problem of intercepting rogue drones targeting sensitive critical infrastructure facilities. While current interception technologies focus mainly on the jamming/spoofing tasks, the challenges of effectively locating and tracking rogue drones have not received adequate attention. Solving this problem and integrating with recently proposed interception techniques will enable a holistic system that can reliably detect, track, and neutralize rogue drones. Specifically, this work considers a team of pursuer UAVs that can search, detect, and track multiple rogue drones over a sensitive facility. The joint search and track problem is addressed through a novel multiagent reinforcement learning scheme to optimize the agent mobility control actions that maximize the number of rogue drones detected and tracked. The performance of the proposed system is investigated under realistic settings through extensive simulation experiments with varying number of agents demonstrating both its performance and scalability.

agent, artificial intelligence, machine learning, (14 more...)

2501.10413

Country: Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reviews: Credit Assignment For Collective Multiagent RL With Global Rewards

Neural Information Processing Systems | Oct-7-2024, 19:56:18 GMT

The paper tackles the multi-agent credit assignment problem, an egregious problem within multi-agent systems, by extending existing methods on difference rewards to settings in which the population of the system is large. Though the results are relevant and lead to an improvement for large population systems, the contribution is nonetheless limited to a modification of existing techniques for a specific setting, which seemingly requires the number of agents to be large and the agents to observe a count of the agents within their neighbourhood. The results of the paper enable improved credit assignment in the presence of noise from other agents' actions, and an improved baseline leading to reduced variance and, in turn, better estimates of the collective policy gradient (under homogeneity assumptions). The analysis of the paper applies to a specific setting in which the reward function has a term that is common to all agents and therefore is not decomposable. The extent to which this property is to be found in multi-agent systems, however, is not discussed.

agent, collective multiagent rl, multi-agent system, (14 more...)

Genre: Summary/Review (0.32)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Multirobot Coordination for Space Exploration

AI Magazine | Jan-4-2018, 06:03:31 GMT

You watch the feed from the onboard camera as the rover rolls along the surface, when you notice the terrain changing ahead, so you instruct the rover to turn. The problem? You're 6 minutes too late. Due to the speed-of-light delay in communication between yourself and the rover, your monolithic multimillion-dollar project is in pieces at the bottom of a Martian canyon, and the nearest repairman is 65 million miles away. There are, of course, solutions to this type of problem. You can instruct the rover to travel a very small distance and reevaluate its situation before the next round of travel, but this leads to painfully slow processes that take orders of magnitude longer than they would on Earth.

artificial intelligence, machine learning, reward, (17 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Counterfactual Multi-Agent Policy Gradients

Foerster, Jakob, Farquhar, Gregory, Afouras, Triantafyllos, Nardelli, Nantas, Whiteson, Shimon

arXiv.org Artificial Intelligence | Dec-14-2017

Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
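The counterfactual baseline described above can be sketched for a single agent as follows: take the critic's value for the joint action actually taken, then subtract the expectation of the critic's value when only that agent's action is resampled from its own policy while the other agents' actions stay fixed. This is a minimal illustration under stated assumptions, not the authors' implementation. `q_joint` is a hypothetical stand-in for the centralised critic at one fixed state, and the sketch loops over actions instead of using the single-forward-pass critic representation the abstract mentions.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical critic: maps a joint action (one action index per agent)
# to a Q-value, with the state held fixed and left implicit.
JointQ = Callable[[Tuple[int, ...]], float]


def counterfactual_advantage(
    q_joint: JointQ,
    joint_action: Sequence[int],
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Q(u) minus the policy-weighted average of Q over the agent's alternatives."""
    q_taken = q_joint(tuple(joint_action))
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        # Swap in the alternative action for this agent only.
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action
        baseline += prob * q_joint(tuple(counterfactual))
    return q_taken - baseline


# Toy critic (assumed for illustration): agent 1's action matters twice as much.
def toy_q(joint: Tuple[int, ...]) -> float:
    return float(joint[0] + 2 * joint[1])


adv_agent0 = counterfactual_advantage(toy_q, [1, 1], agent=0, policy_probs=[0.5, 0.5])
adv_agent1 = counterfactual_advantage(toy_q, [1, 1], agent=1, policy_probs=[0.5, 0.5])
```

In this toy case the advantage for agent 1 is larger than for agent 0, reflecting that the same joint outcome is credited more to the agent whose action choice changed the critic's value more.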

artificial intelligence, deep learning, machine learning, (16 more...)

1705.08926

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Agent Partitioning with Reward/Utility-Based Impact

Curran, William (Oregon State University) | Agogino, Adrian (NASA Ames Research Center) | Tumer, Kagan (Oregon State University)

AAAI Conferences | Mar-1-2015

Reinforcement learning with reward shaping is a well established but often computationally expensive approach to large multiagent systems. Agent partitioning can reduce this computational complexity by treating each partition of agents as an independent problem. We introduce a novel agent partitioning approach called Reward/Utility-Based Impact (RUBI). RUBI finds an effective partitioning of agents while requiring no prior domain knowledge, improves performance by discovering a non-trivial agent partitioning, and leads to faster simulations. We test RUBI in the Air Traffic Flow Management Problem (ATFMP), where there are tens of thousands of aircraft affecting the system and no obvious similarity metric between agents. When partitioning with RUBI in the ATFMP, there is a 37% increase in performance, with a 510x speed increase over non-partitioning approaches. Additionally, RUBI matches the performance of the current domain-dependent ATFMP gold standard using no prior knowledge and with 10% faster performance.

agent, artificial intelligence, machine learning, (18 more...)

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Transportation > Air (0.89)
Transportation > Infrastructure & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multiagent Learning with a Noisy Global Reward Signal

Proper, Scott (Oregon State University) | Tumer, Kagan (Oregon State University)

AAAI Conferences | Jul-9-2013

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent "bar problem", a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be either learned on- or off-line using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
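For readers unfamiliar with the term, a difference reward in its common form credits agent i with the global reward minus the global reward recomputed with agent i removed. The sketch below shows that computation on a toy congestion game in the spirit of a bar problem. The reward shape `x * exp(-x / capacity)`, the function names, and the example values are assumptions made for illustration, not details from the paper. In the approach the abstract describes, a learned approximation of the global reward would take the place of the exact `global_reward` used here.

```python
import math
from typing import Sequence


def global_reward(choices: Sequence[int], n_nights: int, capacity: float) -> float:
    """Toy congestion reward: each night pays x * exp(-x / capacity) for attendance x."""
    counts = [0] * n_nights
    for night in choices:
        counts[night] += 1
    return sum(x * math.exp(-x / capacity) for x in counts)


def difference_reward(
    choices: Sequence[int], agent: int, n_nights: int, capacity: float
) -> float:
    """G(all agents) - G(all agents except `agent`)."""
    choices = list(choices)
    without_agent = choices[:agent] + choices[agent + 1:]
    return global_reward(choices, n_nights, capacity) - global_reward(
        without_agent, n_nights, capacity
    )


# Three agents: two crowd night 0, one attends night 1 alone (assumed toy values).
toy_choices = [0, 0, 1]
d_crowded = difference_reward(toy_choices, agent=0, n_nights=2, capacity=1.0)
d_alone = difference_reward(toy_choices, agent=2, n_nights=2, capacity=1.0)
```

Here the agent on the uncrowded night receives a positive difference reward, while an agent adding to the crowded night receives a negative one, even though all three agents share the same global reward.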

agent, artificial intelligence, difference reward, (18 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)
